NGCN: Drug‐target interaction prediction by integrating information and feature learning from heterogeneous network

Abstract Drug-target interaction (DTI) prediction is essential for new drug design and development. Constructing a heterogeneous network from diverse information about drugs, proteins and diseases provides new opportunities for DTI prediction. However, the inherent complexity, high dimensionality and noise of such a network prevent us from taking full advantage of its characteristics. This article proposes a novel method, NGCN, to predict drug-target interactions from an integrated heterogeneous network, extracting relevant biological properties and association information while preserving the topology information. Unlike traditional methods, it focuses on learning low-dimensional topology representations of drugs and targets via a graph-based convolutional neural network to improve the performance of DTI prediction. NGCN achieves substantial performance improvements over other state-of-the-art methods, such as a nearly 1.0% increase in AUPR value. Moreover, we verify the robustness of NGCN through benchmark tests, and the experimental results demonstrate that it is an extensible framework capable of combining heterogeneous information for DTI prediction.

• The approach using molecular docking requires a known 3D structure of proteins, whereas the complex structures of known protein ligands are scarce and generally unavailable.
• The ligand-similarity approach uses knowledge of known ligand interactions to make predictions. Nevertheless, if the target has insufficient known ligands, the results may be poor.
• Machine learning is the most popular and effective approach at present, which can fully explore the relevant characteristics of drugs and the potential drug-target interactions.
In recent years, many machine learning-based methods have been proposed to predict potential DTIs. They mainly consist of kernel methods, matrix factorization and multi-source information integration.
Based on chemical and genomic information, Yamanishi et al. 6 used kernel regression for DTI prediction and constructed a BLM model using bipartite graphs. Van Laarhoven et al. 7 defined a Gaussian interaction profile kernel based on the topological characteristics of the adjacency matrix and then used the Kronecker regularized least squares (KRLS) algorithm to predict DTIs. Pahikkala et al. 8 also employed the KRLS algorithm, but they characterized drugs by 2D compound similarity and targets by Smith-Waterman similarity. Kernel-based methods only employ simple linear combinations of several individual kernels to generate the final kernel matrix. This may be inappropriate when the relationship between the kernels is not clearly linear.
Matrix factorization is also widely used for DTI prediction. The kernelized Bayesian matrix factorization with twin kernels (KBMF2K) proposed by Gonen et al. 9 maps target proteins and drug compounds into a common subspace and estimates the interaction network using similarity in that subspace. Hao et al. 10 established a drug-target prediction model called DNILMF based on logistic matrix factorization. This model constructs two new kernel matrices, performs nonlinear diffusion between these two matrices and the two original similarity matrices, and predicts drug-target interactions by gathering neighbour information. Ding et al. 11 proposed a multiple kernel-based triple collaborative matrix factorization (MK-TCMF) method to predict DTIs. Its multi-kernel learning (MKL) algorithm regulates the weight of each kernel matrix according to the prediction error. The aforementioned methods rely on direct drug-target associations, which is limiting because the known interaction information is often incomplete.
With the rapid development of bioinformatics, various drug, protein, gene and other types of data have also been adopted for DTI prediction. Wan et al. 12 constructed a large integrated network by combining data from multiple heterogeneous networks, captured the topological characteristics of the integrated network by using neighbourhood aggregation technology 13 and reconstructed the topological representation of all relational matrices. Yu et al. 14 developed an ensemble model (KenDTI) based on both biochemical characteristics of drugs via network integration and molecular sequences via word embedding to predict DTIs. Shao et al. 15 regarded DTI prediction as a link prediction problem and proposed an end-to-end model based on heterogeneous graphs with attention mechanisms (DTI-HETA). Fu et al. 16 proposed a multi-view graph convolutional network (MVGCN) framework for link prediction in biological networks by combining the similarity network to build a multi-view heterogeneous network and obtain node attributes. In addition, a Neighbourhood Information Aggregation (NIA) layer was designed for inter- and intra-domain information updating. Ren et al. 17 integrated a large amount of unlabeled drug molecular graph information and target information and designed a pre-training framework, MGP-DR (molecular graph pretraining for drug representation), for drug pair representation learning. The model used a self-supervised learning strategy to mine contextual information within and between drug molecules to predict drug-drug interactions and drug combinations. The graph convolutional neural network was utilised to obtain the embedded representation of the drugs and targets. The performance of network prediction tasks using graph convolution technology for large-scale graph data has been significantly improved 18 owing to the application of graph neural networks. 19 In multi-source data processing, the features of different data sources are usually simply concatenated. Therefore, making full use of the contributions of data from varied sources and fusing them efficiently is the key to improving DTI prediction accuracy.
Motivated by the recent success of deep learning techniques in learning powerful representations from complex data, [20][21][22][23] Zhang et al. 24 introduced related datasets for DTI prediction. In addition to the previously mentioned self-supervised learning framework, MGP-DR, introduced by Ren et al., 17 Chu et al. 25 proposed HGRL-DTA, a novel approach for drug-target binding affinity prediction through hierarchical graph representation learning. By incorporating both global affinity relationships and local chemical structures of drug and target molecules, and by utilising message broadcasting strategies, the model can synergistically integrate hierarchical information. The heterogeneous graph automatic meta-path learning-based DTI prediction method (HampDTI), proposed by Wang et al., 26 employed a node-type specific graph convolutional network (NSGCN) to learn the embeddings of drugs and targets using meta-paths learned from a heterogeneous graph. The embeddings from multiple meta-path graphs are then combined to predict new DTIs.
The advantage of deep learning methods is their ability to identify hidden interactions between drugs and targets. However, they still have room for improvement in the following two aspects: (1) DTI prediction aims to discover new DTIs, so selecting drug-target pairs that are truly non-interacting is a thorny issue; (2)  networks and reduce the feature information of each drug or target to a low-dimensional feature representation. Based on these low-dimensional feature vectors, a spectral graph convolutional network (GCN) is further applied to learn the drug or target features and avoid inaccuracy caused by the noise and incompleteness of large-scale biological data. We compare NGCN with other methods to demonstrate its effectiveness and gradually increase the number of networks to demonstrate the integration capability of NGCN. The results demonstrate that NGCN is promising for drug-target interaction prediction.

| PRELIMINARIES
Drug-target interaction prediction based on network fusion aims to conduct prediction tasks by jointly utilising different views of the data and exploiting their complementarity.
Recently, there have been significant efforts towards integrating heterogeneous information from multiple networks. They can be roughly divided into two types of processes:
• Gather multiple networks to build a large integrated network and extract information for prediction.
• Extract feature information from each network and then fuse them for similarity or correlation prediction.
It is difficult to distinguish the discrepancies between different networks while constructing large integrated networks. Moreover, if the number of integrated networks is too large, computations on such a network become challenging due to the increasing network complexity.
Extracting information from each network and then making fusion predictions is the primary approach for drug-target interaction prediction. The process is mainly composed of three steps: (1) extracting drug or protein information from each network; (2) feature fusion and dimensionality reduction; and (3) correlation prediction or drug repositioning prediction based on the extracted feature information.
Information extraction on a single network is the key step in network fusion. Common feature extraction techniques are matrix decomposition and random walk with restart (RWR). The former usually decomposes the incidence matrix into two low-dimensional factor matrices and minimises the reconstruction loss. However, this strategy might lead to information loss and fail to capture the global characteristics of the incidence matrix.
As for RWR, a pre-defined restart probability is introduced into the random walk to identify direct or indirect relationships between the nodes of a network. Suppose $A$ is the adjacency matrix and $D$ is the diagonal matrix with $D_{i,i} = \sum_{j=1}^{n} A_{i,j}$. The one-step probability transition matrix $\hat{A}$ is obtained by normalising the adjacency matrix:

$$\hat{A} = D^{-1} A \tag{1}$$

Next, we introduce the $t$-step RWR vector $r_i^{t}$ of a walk starting from node $i$, whose $j$-th entry is the probability of visiting node $j$ after $t$ transitions. Let $r_i^{0}$ be the $n$-dimensional initial one-hot vector. The RWR process is defined as:

$$r_i^{t+1} = (1 - p)\, r_i^{t} \hat{A} + p\, r_i^{0} \tag{2}$$

where $p$ represents the probability of restart, and its value controls the balance between global and local structural characteristics of the network. By iteratively executing the above process, we obtain the diffusion state $r_i$ of the node, which is a high-level representation of the structural characteristics of the network. If two nodes in a network share similar diffusion states, they have similar neighbourhood characteristics in the network. 27
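The RWR iteration can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name, the convergence tolerance and the handling of isolated nodes are assumptions introduced here, and all start nodes are processed at once by stacking the one-hot vectors as the rows of an identity matrix.

```python
import numpy as np

def random_walk_with_restart(A, p=0.5, tol=1e-8, max_iter=1000):
    """Return the diffusion-state matrix R; row i is the converged RWR
    distribution of a walk that restarts at node i with probability p."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    deg = A.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0            # assumption: isolated nodes keep an all-zero row
    A_hat = A / deg                # one-step transition matrix D^{-1} A
    R0 = np.eye(n)                 # one-hot start vectors r_i^0, stacked as rows
    R = R0.copy()
    for _ in range(max_iter):
        R_next = (1.0 - p) * (R @ A_hat) + p * R0
        if np.abs(R_next - R).max() < tol:
            return R_next
        R = R_next
    return R
```

For a graph without isolated nodes, the fixed point of this iteration is $p\,(I - (1-p)\hat{A})^{-1}$, which gives a convenient check on small examples.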

| METHOD
The diffusion state is inaccurate, partially because the network data set in the experiment is noisy and incomplete. Luo et al. 27 improved the diffusion component analysis method (DCA) 28 and proposed clusDCA for dimension reduction in the form of efficient matrix decomposition. We incorporate it into our proposed model, NGCN.
NGCN first conducts the RWR process on each drug or protein within each similarity network to acquire the distribution of each drug or protein node, termed the diffusion state. The diffusion state captures the topological relationship of a node with all other nodes in the heterogeneous network. Subsequently, the improved clusDCA algorithm is employed to compute the low-dimensional representation of the nodes. Leveraging the learned low-dimensional features of drugs and proteins (where each row in the low-dimensional drug features represents a feature vector of a drug and each column in the low-dimensional protein features is a feature vector of a protein), NGCN executes spectral graph convolution to further refine the features of drugs and proteins. Finally, the drug-target matrix is reconstructed to identify unknown drug-target interactions. Details of the NGCN model are depicted in Figure 1.

| Diffusion state of nodes by RWR
Our network data consist of homogeneous interaction networks, such as the PPI network, and heterogeneous interaction networks, such as protein-disease association networks. For the input homogeneous interaction networks (e.g. drug-drug interaction networks), we compute the "diffusion state" of each drug or target by directly running the RWR algorithm on each of these networks. For heterogeneous interaction networks, we first build similarity networks (e.g. a protein-protein similarity network built from the protein-disease association network) and then run the RWR process on these similarity networks to obtain the diffusion states of drugs or proteins. Overall, we construct similarity networks for drugs based on (i) drug-drug interactions, (ii) drug-disease associations and (iii) drug-side-effect associations. In a similar way, we construct similarity networks for proteins based on (i) protein-protein interactions and (ii) protein-disease associations.
Further, we use the Jaccard similarity coefficient to calculate the similarity between drugs, which is based on the common neighbours and the union of the sets of all neighbours of the two drugs.
Given two nodes $i$ and $j$ with neighbour sets $N(i)$ and $N(j)$, their similarity within a heterogeneous network is defined as:

$$\mathrm{sim}(i, j) = \frac{|N(i) \cap N(j)|}{|N(i) \cup N(j)|}$$

Then the diffusion state of each network can be obtained by running the RWR process on each similarity network, as described in Equ 2.
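As an illustration of how a similarity network can be derived from an association network, the following sketch computes pairwise Jaccard coefficients from a binary association matrix (for example drugs × diseases). The function name and the convention that two nodes without any neighbours receive similarity 0 are assumptions made here, not details taken from the paper.

```python
import numpy as np

def jaccard_similarity(B):
    """B: binary association matrix of shape (entities, neighbours).
    Returns the (entities x entities) matrix of Jaccard coefficients."""
    B = (np.asarray(B) > 0).astype(float)
    inter = B @ B.T                                   # |N(i) ∩ N(j)|
    sizes = B.sum(axis=1)                             # |N(i)|
    union = sizes[:, None] + sizes[None, :] - inter   # |N(i) ∪ N(j)|
    # Assumption: pairs with an empty union are assigned similarity 0.
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
```

The resulting matrix can be used as the weighted adjacency matrix on which the RWR process is run.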

| Performing feature reduction and feature extraction
Owing to data quality and dimensionality issues, the diffusion state of drugs and targets produced by RWR may be error-prone.
In particular, when multiple networks are integrated, it is often inconvenient to use the high-dimensional diffusion state directly as topological features. To address these problems and obtain important topological feature information about nodes from the diffusion state, we adopt a new diffusion component analysis method (clusDCA 29 ) to perform feature reduction on the diffusion state. Given node i, we model the probability assigned to node j in the diffusion state of node i as follows: (3)

FIGURE 1 NGCN uses the drug-protein association network, protein-protein association network, drug-drug interaction network, drug-disease network, protein-disease association network and drug-side effect network. We first obtain the diffusion state matrix of each network through the RWR algorithm (i.e. RWR is run on each network to obtain a distribution for each drug or protein node, which captures its topological relations to all other nodes in the heterogeneous network). The improved clusDCA algorithm is then used to calculate the low-dimensional representation of the nodes. We add a spectral GCN to update the node features before reconstructing the drug-target matrix. NGCN effectively learns topology-preserving node features that are useful for predicting drug-target interactions by enforcing the reconstruction of the original individual networks. Finally, the updated node features are used to reconstruct the drug-target matrix.
In order to reduce the feature dimension more quickly and conveniently, clusDCA achieves rapid decomposition of the diffusion state via matrix decomposition. By modifying the formula, we have: To optimise the objective function, we use singular value decomposition (SVD) in this process. Let $L$ represent the logarithmic diffusion state matrix of the network. We define the SVD of the matrix $L$ as follows:

$$L = U \Sigma V^{T}$$

where $U, \Sigma, V \in \mathbb{R}^{n \times n}$. Let the low-dimensional feature matrix be $X$. In terms of SVD, we calculate $X$ as follows:

$$X = U_d \Sigma_d^{0.5}$$

where $U_d$ represents the first $d$ singular vectors and $\Sigma_d^{0.5}$ is the 0.5 power of the first $d$ singular values.
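The SVD step can be sketched as follows. The smoothing of the diffusion state before taking the logarithm (adding $1/n$ and subtracting $\log(1/n)$) is an assumption borrowed from common DCA-style implementations and is not stated in the text above; the function name and the returned context matrix `W` are likewise illustrative.

```python
import numpy as np

def dca_features(S, d):
    """S: (n x n) diffusion-state matrix, one row per node.
    Returns node features X (n x d) and context features W (n x d)
    such that X @ W.T approximates the log diffusion state L."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    # Assumed smoothing: avoids log(0) and keeps L non-negative.
    L = np.log(S + 1.0 / n) - np.log(1.0 / n)
    U, sigma, Vt = np.linalg.svd(L, full_matrices=False)
    root = np.sqrt(sigma[:d])
    X = U[:, :d] * root            # X = U_d Sigma_d^{0.5}
    W = Vt[:d, :].T * root         # W = V_d Sigma_d^{0.5}
    return X, W
```

With $d = n$ the product `X @ W.T` recovers $L$ exactly; for $d \ll n$ it is the best rank-$d$ approximation in the least-squares sense.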
To integrate heterogeneous network data, the DCA of the above single network needs to be extended to the multi-network case.
More specifically, let $L = \{L_1, \dots, L_K\}$ denote the set of logarithmic diffusion state matrices obtained from the diffusion states $R_c = \{S_1, \dots, S_K\}$ of $K$ input networks. Then, the following objective function needs to be optimised: where $w_j^{r}$ represents the network-specific feature of node $j$ in network $r$, and the node feature $x_i$ is shared among all $K$ networks. The above objective function can also be optimised by SVD.
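One way to realise the shared-feature decomposition with a single SVD is to concatenate the $K$ logarithmic diffusion matrices column-wise and factorise the result; the left factor then serves as the shared node features and the right factor splits into $K$ network-specific context matrices. The sketch below assumes this concatenation strategy and the same $1/n$ smoothing as in the single-network sketch; neither is confirmed by the text, and all names are illustrative.

```python
import numpy as np

def joint_dca_features(S_list, d):
    """S_list: K diffusion-state matrices, each (n x n), over the same nodes.
    Returns shared node features X (n x d) and a list of K context
    matrices W_r (n x d), one per network."""
    n = np.asarray(S_list[0]).shape[0]
    logs = [np.log(np.asarray(S, dtype=float) + 1.0 / n) - np.log(1.0 / n)
            for S in S_list]
    L_cat = np.hstack(logs)                          # n x (K * n)
    U, sigma, Vt = np.linalg.svd(L_cat, full_matrices=False)
    root = np.sqrt(sigma[:d])
    X = U[:, :d] * root                              # shared across networks
    W_cat = Vt[:d, :].T * root                       # (K * n) x d
    W_list = np.split(W_cat, len(S_list), axis=0)    # one block per network
    return X, W_list
```

Because the truncated SVD gives the best rank-$d$ approximation of the concatenated matrix, this minimises the summed squared error $\sum_r \lVert X W_r^{T} - L_r \rVert_F^2$ over all networks.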

| Updating feature information
Although we have obtained the low-dimensional representation of drug or target nodes, the node features need to be further updated because the biological information is noisy and uncertain. Here, we use the spectral graph-based convolutional neural network to update the features.
Given the node feature $X^{(u)}$, $u \in \{\mathrm{drug}, \mathrm{protein}\}$, we update the features from each $X^{(u)}$ through spectral graph convolution to obtain a new representation. For the similarity network of $u$, we specify $\tilde{A}^{(u)} = A^{(u)} + I_N$ and the diagonal matrix $\tilde{D}^{(u)}$, where

$$\tilde{D}^{(u)}_{i,i} = \sum_{j} \tilde{A}^{(u)}_{i,j}$$

We then apply spectral convolution to obtain a new representation of the node features $H^{(u)}$:

$$H^{(u)} = \sigma\left( \left(\tilde{D}^{(u)}\right)^{-1/2} \tilde{A}^{(u)} \left(\tilde{D}^{(u)}\right)^{-1/2} X^{(u)} W^{(u)} \right)$$

where $\tilde{A} = A + I_N$ is the adjacency matrix with self-connections, $\sigma(\cdot)$ represents a non-linear function such as ReLU or sigmoid, and $W^{(u)}$ is a weight matrix. Therefore, the new representation $H^{\mathrm{drug}}$ of the drugs can be obtained from the drug similarity matrix $A^{(\mathrm{drug})}$ and the drug feature $X^{(\mathrm{drug})}$, and the new representation $H^{\mathrm{protein}}$ of the proteins can be obtained in the same way.
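A single spectral graph convolution of this kind can be written compactly in NumPy. The sketch assumes the standard symmetric normalisation with self-loops and a non-negative similarity matrix; the function name and the default ReLU activation are choices made for illustration, and the weight matrix would be learned during training rather than supplied by hand.

```python
import numpy as np

def gcn_layer(A, X, W, activation=None):
    """One spectral graph-convolution step:
    H = sigma(D~^{-1/2} (A + I) D~^{-1/2} X W)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    A_tilde = A + np.eye(n)                      # add self-connections
    deg = A_tilde.sum(axis=1)                    # >= 1 for non-negative A
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    A_norm = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]
    Z = A_norm @ np.asarray(X, dtype=float) @ np.asarray(W, dtype=float)
    if activation is None:
        return np.maximum(Z, 0.0)                # ReLU by default
    return activation(Z)
```

For a graph with no edges the normalised adjacency reduces to the identity, so the layer simply applies the weight matrix and the non-linearity to each node independently.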

| Reconstructing drug-target matrix
Based on the obtained drug and target features, we reconstruct the drug-target matrix for the purpose of prediction.
Topology-preserving learning of the node embedding 12 is a proven way to reconstruct the drug-target prediction matrix. Given n drug nodes and m protein nodes, the reconstructed DTI matrix can be expressed as: where D r ∈ R d×n , P r ∈ R d×m are specific mapping matrices of drug and protein, n and m represent the number of drugs and proteins, respectively, and r means a protein interaction.
The above equation states that the edge values can be reconstructed by mapping the drug features and the target features through the mapping functions D r and P r and taking the inner product of the mapped vectors. Natarajan and Dhillon et al. 28 also used similar reconstruction strategies to solve the prediction problem. In the training process, the sum of the squared reconstruction errors over all edges is minimised by learning the unknown parameters. So, given a drug-target edge weight vector Y, we define the reconstruction loss of the edge weight value as: (5) The final objective function can then be minimised by gradient descent.
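To make the reconstruction idea concrete, the sketch below projects drug and protein features with two mapping matrices, reconstructs the interaction matrix from inner products of the projected vectors, and minimises the squared error by plain gradient descent. The projection shapes (feature dimension × k), the mask over observed pairs, the learning rate and all function names are assumptions for illustration; they do not reproduce the exact form of the paper's reconstruction equation.

```python
import numpy as np

def reconstruct(H_drug, H_prot, G, P):
    """Inner products of projected drug and protein features (n x m)."""
    return (H_drug @ G) @ (H_prot @ P).T

def masked_sq_loss(Y_rec, Y, mask):
    """Sum of squared reconstruction errors over the pairs selected by mask."""
    return float((((Y_rec - Y) ** 2) * mask).sum())

def train(H_drug, H_prot, Y, mask, k=8, lr=0.01, steps=300, seed=0):
    """Learn projection matrices G and P by gradient descent (illustrative)."""
    rng = np.random.default_rng(seed)
    G = 0.1 * rng.standard_normal((H_drug.shape[1], k))
    P = 0.1 * rng.standard_normal((H_prot.shape[1], k))
    for _ in range(steps):
        A = H_drug @ G                        # projected drugs, n x k
        B = H_prot @ P                        # projected proteins, m x k
        E = 2.0 * (A @ B.T - Y) * mask        # d(loss) / d(Y_rec)
        grad_G = H_drug.T @ (E @ B)
        grad_P = H_prot.T @ (E.T @ A)
        G -= lr * grad_G
        P -= lr * grad_P
    return G, P
```

In practice the mask would select the training pairs only, so that held-out pairs do not contribute to the loss.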

| Pseudocode of NGCN
The pseudocode for NGCN is provided in Algorithm 1 below.

| Dataset
Throughout training, we used the same dataset as Luo et al. 27 The dataset contains four types of nodes: drug nodes, protein nodes, disease nodes and side-effect nodes. Isolated nodes were excluded without exception.
The dataset includes two kinds of similarity networks and six types of association networks. The association networks are the drug-protein association network, 30 protein-protein association network, 31 drug-drug interaction network, 30 drug-disease network, 32 protein-disease association network 32 and drug-side effect network. 33 These networks can be used to construct corresponding similarity networks with respect to proteins and drugs. Of the two similarity networks, the protein similarity network is generated from the sequence similarity of proteins, and the drug similarity network is constructed from the similarity of chemical structures.

| Superiority in DTI prediction
A drug-target pair with a known interaction is considered a positive sample, and a drug-target pair with an unknown interaction is generally viewed as a negative sample. To measure the performance of NGCN in predicting DTIs, we first performed 10-fold cross-validation on all positive pairs and a set of randomly sampled negative pairs, whose number was 10 times that of the positive samples. This scenario roughly simulates the practical situation in which DTIs are sparsely labelled. For each fold, a randomly chosen subset of 90% of the positive and negative pairs was used as training data to construct the heterogeneous networks and train the parameters of NGCN, and the remaining 10% of the positive and negative pairs were held out as the test set.
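The sampling and splitting protocol can be sketched as follows. The helper names, the random seeds and the use of a plain (non-stratified) shuffle are assumptions made for illustration; the sketch only shows how positives, sampled negatives and the ten folds relate to each other.

```python
import numpy as np

def sample_pairs(Y, neg_ratio=10, seed=0):
    """Collect all positive pairs of the binary matrix Y plus randomly
    sampled unknown pairs (treated as negatives) at the given ratio."""
    rng = np.random.default_rng(seed)
    pos = np.argwhere(Y == 1)
    neg_all = np.argwhere(Y == 0)
    n_neg = min(len(neg_all), neg_ratio * len(pos))
    neg = neg_all[rng.choice(len(neg_all), size=n_neg, replace=False)]
    pairs = np.vstack([pos, neg])
    labels = np.concatenate([np.ones(len(pos), dtype=int),
                             np.zeros(n_neg, dtype=int)])
    return pairs, labels

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) for each of k shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]
```

Within each fold, the held-out positive pairs would be removed from the drug-target network before the diffusion states are computed, so that no test information leaks into the features.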
We compared NGCN with six baseline methods, including NeoDTI, 12 DTINet, 27 BLMNII, 34 MOLIERE, 35 NetLapRLS 36 and HNM. 37 Two evaluation indicators, AUPR (the area under the precision-recall curve) and AUROC (the area under the receiver operating characteristic curve), were used to measure performance.
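Both indicators can be computed directly from predicted scores and binary labels. The sketch below uses the pairwise-ranking definition of AUROC and average precision as an estimate of AUPR; choosing average precision as the estimator, and the function names, are assumptions made here, and the pairwise AUROC needs memory proportional to the number of positive-negative pairs.

```python
import numpy as np

def auroc(labels, scores):
    """Probability that a random positive is scored above a random
    negative; ties count as one half."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive sample."""
    labels = np.asarray(labels).astype(int)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = labels[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision_at_k * hits).sum() / hits.sum())
```

With a 1:10 ratio of positives to negatives, AUPR is usually the more informative of the two, because AUROC is comparatively insensitive to the large number of easy negatives.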
In Figure 2, we can observe that NGCN outperforms all other methods, including the best-performing baseline. In addition to known DTI data, the chemical structure, protein sequence information and other properties of drugs and targets can also be characterised through their various functional roles in biological systems, such as protein-protein interactions and drug-disease associations. By integrating disparate information from heterogeneous data sources, methods such as DTINet, NeoDTI and HNM can further improve the accuracy of DTI predictions. However, these approaches still have some limitations that need to be addressed.
For example, the HNM method only considers three different types of data to make relationship predictions, thus discarding a lot of valuable information. In addition, methods such as BLMNII and MOLIERE only take relatively simple forms (such as bilinear or log-linear functions), which may not be sufficient to capture the complex hidden features behind heterogeneous data. The reason for NGCN's strong performance lies in its initial use of RWR to compute the diffusion state of nodes for each network, followed by its integration with clusDCA for dimensionality reduction. In this manner, the noise in the data is substantially reduced. To verify the performance of NGCN under sparse positive samples, we changed the number of samples and set the ratio of positive to negative examples to 1:10. The performance of all other algorithms decreased, whereas NGCN still achieved the best prediction performance. This shows that even in the case of sparse labelling, the prediction performance of the other methods remains inferior to that of NGCN. In addition, we performed statistical significance tests at the 95% confidence level on the results of NGCN and NeoDTI (the best-performing method in the comparison experiment) using 10-fold cross-validation. The results show that the observed differences between the two methods are statistically significant.
The data may be redundant; for example, the dataset may contain multiple homologous proteins for one protein or multiple highly similar drugs for one drug, which may negatively affect the performance. Therefore, we applied the same strategy as Luo et al. to reduce the impact of data redundancy by removing drug-target associations of similar drugs or targets from the drug-target interaction matrix. We eliminated drug-target associations in which the Jaccard similarity in the association network was greater than 0.6, the structure similarity score in the medicinal chemical similarity network exceeded 0.6, and the identity score in the protein-protein sequence similarity network exceeded 0.4.
In this experiment, we kept a 1:1 ratio of negative to positive samples. As expected, after the removal of similar entries, the performance of NGCN declined but remained superior to that of the other baseline methods.

| Effects of NGCN components
In this paper, we propose a multi-network integration algorithm, termed NGCN, and apply it to drug-target interaction prediction using a GCN model. We use GCN to aggregate neighbourhood features to further improve the usefulness of the features. The spectral-based graph convolutional network (GCN) method introduces filters from the perspective of graph signal processing to define graph convolution, where the graph convolution operation is interpreted as removing noise from the graph signal. To evaluate the contribution of the GCN component, we implemented a multi-network integration framework without feature updating (i.e. without using the spectral-based graph convolutional neural network to update features). We compared NGCN with this variant to validate the effect of the feature updating operation, and the experimental results are reported in Table 1. The results show that the feature updating operation of NGCN substantially improves the prediction of drug-target interactions.

| Robustness
In this experiment, we mainly evaluated the influence of parameters and the robustness of NGCN. The robustness of NGCN was tested by changing the number of networks related to the drugs or targets, the feature dimension and the hyperparameters of NGCN. All experimental results were obtained with 10-fold cross-validation.
We started by examining the effect of aggregating multiple heterogeneous networks on the predicted results. We first used only the drug- and protein-related networks (i.e. the drug similarity network, drug-drug association network, protein-protein association network, protein similarity network and drug-protein association network) to conduct performance evaluation. We observed that the prediction performance was significantly reduced compared to the original model, NGCN, which obtains features from all networks. We then added the networks associated with diseases and side effects. As expected, the prediction performance improved as more drug- and protein-related networks were added. These experiments show that aggregating heterogeneous information from networks generated by multiple data sources improves the prediction accuracy. Furthermore, we applied NGCN to predict drug-target interactions under different feature dimensions and compared the AUPR values of the predicted results. According to the experiments of Wang et al., 29 a feature dimension of 10%-20% of the diffusion state dimension achieved the best results. We expanded the range to 10%-30% and set the drug dimension to 80, 110, 140, 170 and 200 and the protein dimension to 200, 250, 300, 350 and 400. We observed little impact on the predicted results (see Figure 3).
We further investigated the impact of hyperparameters on experimental performance. Here, we mainly studied the influence of the restart probability p of the random walk on the experimental results. In the test, we considered restart probability values between 0.4 and 0.7 to observe the performance stability under different probabilities. Figure 3 shows that NGCN achieves stable performance when the restart probability is varied from 0.4 to 0.7.
Thus, these parameters have little impact on the experimental performance.

TABLE 1
Performance of drug-target interaction prediction under different settings (No. positive : No. negative = 1:1).

FIGURE 2
Comparison between NGCN and related methods. We apply 10-fold cross-validation in our experiments and compare NGCN with six other prediction methods (NeoDTI, DTINet, BLMNII, MOLIERE, NetLapRLS and HNM) in terms of prediction performance. The y-axis shows AUPRC as the measure of prediction performance. (A) Specifying proportion 1:1 for positive and negative examples. (B) Specifying proportion 1:10 for positive and negative examples. (C-F) Several strategies to remove data redundancy: (C) Removing DTIs sharing similar drugs. (D) Deleting DTIs sharing similar diseases. (E) Deleting DTIs with drugs showing similar side effects. (F) Pruning DTIs with similar drugs or proteins.

FIGURE 3
Robustness of NGCN. (A) Effects of aggregating multiple heterogeneous networks. (B) Effects of drug dimensions. (C) Effects of protein dimensions. (D) Effects of restart probability.

$d \ll n$. In this case, $w_i^{T} x_j$ is a low-dimensional approximation, and the next term log ∑ and $w_i$ describe the topology of the network, $x_i$ represents the node feature, and $w_i$ can be regarded as the context characteristics of node $i$. clusDCA takes a set of observed diffusion states $S = \{s_1, \dots, s_n\}$ as input and uses the sum of squared errors as the objective function:

ALGORITHM 1: Pseudocode of NGCN
• Step 1: the diffusion state $S_i$ for each drug or target is derived by performing the RWR algorithm (as shown in Equ 2) on each network.
• Step 2: clusDCA takes the diffusion state set $R_{c_1} = \{S_1, \dots, S_4\}$ of the drugs and the diffusion state set $R_{c_2} = \{S_5, \dots, S_7\}$ of the proteins as input and computes their low-dimensional feature representations.
• Step 3: the spectral graph convolutional neural network is then employed to further update the features of the drugs and proteins.
• Step 4: the drug-target matrix $Y_{rec}$ is reconstructed by Equ 13, after obtaining the updated features $H_{drug}$ and $H_{target}$.

The best performance results are highlighted in bold.